Kaniadakis’s Information Geometry of Compositional Data

We propose to use a particular case of Kaniadakis’ logarithm for the exploratory analysis of compositional data following the Aitchison approach. The affine information geometry derived from Kaniadakis’ logarithm provides a consistent setup for the geometric analysis of compositional data. Moreover, the affine setup suggests a rationale for choosing a specific divergence, which we name the Kaniadakis divergence.


Introduction
This paper presents Kaniadakis' statistics as a methodology in data science. Precisely, we discuss Kaniadakis' formalism for defining an affine structure on the open probability simplex. We present the methods in some generality and use them for the exploratory analysis of compositional data. The illustrative example is a small dataset, and we do not discuss scaling issues of our methods. However, the dataset is of independent interest in financial risk analysis.

Why a Geometric Methodology
Kaniadakis' logarithm [1,2] generalises the ordinary logarithm in a way that supports the development of deformed exponential families, deformed statistical divergences, and deformed entropy. Kaniadakis was originally motivated by applications to nonextensive statistical physics in the sense of [3,4]. In this paper, we present the geometry of the probability simplex as a system of two affine spaces in duality from the perspective of information geometry (IG) [5]. The affine setup was first applied to deformed statistical models in [6].
The systematic use of this formal geometric perspective provides a robust and unified rationale for discussing key descriptive concepts. Defining a geometry is much more than providing a topology or a distance. We define affine geodesics and a natural duality, so that the surfaces orthogonal to the geodesics are defined by a specific divergence function. The level sets of the divergence form a neighbourhood system and, eventually, a topology. In this setup, we define the barycentre, the displacement from the barycentre, and dimensionality reduction. For the standard affine geometry of the probability simplex, see, for example, the tutorial reference [7]. We use a special kind of Kaniadakis' logarithm that appears under a different name in the compositional data (CoDa) literature ([8], Example 4.20).

CoDa
Compositional data (or CoDa) are data in which all components of a (row) vector $[x_1, x_2, \ldots, x_D]$ are positive real values (although zero values may also occur) and thus carry only relative information; such a vector is called a D-part composition.

CoDa and Systemic Financial Risk
The Center for Risk Management at the University of Lausanne (http://www.crml.ch, accessed on 28 June 2023) provides systemic risk assessments for European financial institutions, which we used in our empirical study of the Kaniadakis methods presented below. The dataset enables the computation of country-level SRISK values; SRISK is a market-based systemic risk indicator first proposed in [16,17] and most recently examined in [18].
SRISK is popular in the literature, where it is mainly used to identify weak institutions and countries with a system-wide impact before a crisis occurs [19]; it can also help forecast actual sector performance [20]. Most of the previous literature has focused on the absolute values of SRISK. In this work, we apply the Kaniadakis methods by treating the contributions of the different European countries as the parts of a composition. We develop the work started in [12], whose authors first introduced compositional data analysis to examine the distribution of relative contributions to SRISK of key European countries from 2008 to 2021.
Aitchison [21] first introduced CoDa analysis. The study in [12] applied Aitchison's methods to financial data to examine how European countries contribute to the total amount of systemic risk (SRISK). The authors find that CoDa analysis, and especially Aitchison geometry, is very effective in detecting the threats of potential instability posed by smaller institutions and countries, which might not fully emerge from the scale of their systemic risk.

Data and Methods
This paper first establishes a novel theoretical framework for compositional data using Kaniadakis' logarithm. Second, we apply the Kaniadakis divergence to the compositional data and compute the exponential and mixture displacements. Next, we compute the barycentre and the deviation from it. The purpose of computing the barycentre is to measure how far the SRISK compositions are from their centre.
We consider ten European economies (Belgium, Denmark, France, Germany, Greece, Italy, Netherlands, Spain, Switzerland, and the UK) with annual SRISK measurements collected at the end of December for 2008-2021. All values are in billions of euros. As in most CoDa applications, the sample does not cover all of Europe. Therefore, the ten components that make up our SRISK compositions are only a portion of all those that could be used. CoDa analysis, however, rests on the basic notion of sub-compositional coherence, which ensures that a compositional study conducted on a subset of components is consistent with the same analysis performed on the entire composition.
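Because the SRISK values are in billions of euros, each year's ten values have to be closed (rescaled to sum to one) before they are treated as a composition. The following minimal sketch, with hypothetical helper names and made-up numbers (not the SRISK data), shows closure and the extraction of a sub-composition. It illustrates the property on which sub-compositional coherence rests in the Aitchison approach: the ratio between two retained parts does not change when a sub-composition is re-closed.

```python
def closure(x):
    """Rescale non-negative values so that they sum to one (a D-part composition)."""
    total = sum(x)
    if total <= 0 or any(v < 0 for v in x):
        raise ValueError("closure needs non-negative values with a positive sum")
    return [v / total for v in x]


def subcomposition(x, parts):
    """Keep only the parts with the given indices and re-close them."""
    return closure([x[i] for i in parts])


if __name__ == "__main__":
    raw = [12.0, 3.0, 45.0, 30.0]  # hypothetical amounts, e.g. billions of euros
    comp = closure(raw)
    sub = subcomposition(raw, [0, 2, 3])
    print(comp)  # approximately [0.1333, 0.0333, 0.5, 0.3333]
    # the three ratios below all equal 12/45 up to rounding
    print(sub[0] / sub[1], comp[0] / comp[2], raw[0] / raw[2])
```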

Kaniadakis' Logarithm
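We summarise the particular case of Kaniadakis' logarithm with a purely algebraic form. In the suggestive formalism introduced by [22], generalised logarithms are associated with a reciprocal derivative function A through $\log(x) = \int_1^x dt/A(t)$. In our case,
$$A(x) = \frac{2x^2}{1 + x^2}, \quad x > 0, \qquad (1)$$
so that
$$\log_\kappa(x) = \frac{x - x^{-1}}{2}, \qquad \exp_\kappa(y) = y + \sqrt{1 + y^2},$$
which is Kaniadakis' logarithm $\log_\kappa(x) = (x^\kappa - x^{-\kappa})/(2\kappa)$ for $\kappa = 1$. Notice that the growth is linear in both directions. Moreover, $x = \exp_\kappa(y)$ solves $x^2 = 1 + 2yx$; this equation reduces any polynomial in y and x to a polynomial of degree one in x, for example $x^3 = x + 2yx^2 = (1 + 4y^2)x + 2y$, and so on. This is an algebraic feature, and this theory is a case of algebraic statistics [23].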
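A minimal sketch of the special case, assuming $\kappa = 1$ (our reading of the "linear growth" and "algebraic form" remarks above). The function names are hypothetical. It checks numerically that $\exp_\kappa$ inverts $\log_\kappa$, that $A$ is the reciprocal derivative of $\log_\kappa$, and the polynomial reductions $x^2 = 1 + 2yx$ and $x^3 = (1 + 4y^2)x + 2y$ for $x = \exp_\kappa(y)$.

```python
import math

KAPPA = 1.0  # assumption: the algebraic special case is kappa = 1


def log_k(x, kappa=KAPPA):
    """Kaniadakis logarithm (x**kappa - x**-kappa) / (2 kappa), for x > 0."""
    return (x ** kappa - x ** (-kappa)) / (2.0 * kappa)


def exp_k(y):
    """Inverse of log_k for kappa = 1, i.e. y + sqrt(1 + y^2).

    Uses (y + s)(s - y) = 1 with s = sqrt(1 + y^2) to avoid cancellation for y < 0.
    """
    t = abs(y) + math.hypot(1.0, y)  # t >= 1
    return t if y >= 0 else 1.0 / t


def A(x):
    """Reciprocal derivative 1 / log_k'(x) for kappa = 1: 2 x^2 / (1 + x^2)."""
    return 2.0 * x * x / (1.0 + x * x)


if __name__ == "__main__":
    for y in (-3.0, -0.5, 0.0, 0.7, 4.0):
        x = exp_k(y)
        assert math.isclose(log_k(x), y, rel_tol=1e-9, abs_tol=1e-12)
        # algebraic relation x^2 = 1 + 2 y x reduces powers of x to degree one
        assert math.isclose(x * x, 1.0 + 2.0 * y * x, rel_tol=1e-9)
        assert math.isclose(x ** 3, (1.0 + 4.0 * y * y) * x + 2.0 * y, rel_tol=1e-9)
    # A is the reciprocal of the derivative of log_k (central finite difference)
    x0, h = 0.3, 1e-6
    slope = (log_k(x0 + h) - log_k(x0 - h)) / (2.0 * h)
    assert math.isclose(1.0 / slope, A(x0), rel_tol=1e-6)
    print("ok")
```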

Kaniadakis' Exponential form of a Positive Probability Function
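Let the sample space Ω be a finite set. The probability simplex on Ω is $P(\Omega) = \{p \colon \Omega \to \mathbb{R} \mid p(\omega) \ge 0,\ \sum_{\omega} p(\omega) = 1\}$, and the open probability simplex is $E(\Omega) = \{p \in P(\Omega) \mid p(\omega) > 0 \text{ for all } \omega \in \Omega\}$.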
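For all $p \in E(\Omega)$, the function $A \circ p$, with A as in Equation (1), is strictly positive and provides a positive weight on Ω. It is proportional, but usually not equal, to a probability function, the escort probability function
$$\tilde p = \frac{A \circ p}{\sum_{\omega \in \Omega} A(p(\omega))}.$$
The mapping $p \mapsto \tilde p$ is called the escort mapping; see [22]. See ([24], §3.1) for a discussion of its injectivity and surjectivity. We write the escort expectation as $E_{\tilde p}[u] = \sum_{\omega} u(\omega)\,\tilde p(\omega)$. For $p, q \in E(\Omega)$, the Kaniadakis divergence is obtained from the usual Kullback-Leibler divergence by replacing the ordinary logarithm with the Kaniadakis logarithm and the probability function p with its escort $\tilde p$:
$$D(p\|q) = E_{\tilde p}\left[\log_\kappa p - \log_\kappa q\right]. \qquad (5)$$
Clearly, $D(p\|p) = 0$. If $p \ne q$, the concavity of $\log_\kappa$ in Equation (3), $\log_\kappa q \le \log_\kappa p + (q - p)/A(p)$, with strict inequality where $q \ne p$, gives
$$D(p\|q) > E_{\tilde p}\left[\frac{p - q}{A(p)}\right] = \frac{\sum_\omega \big(p(\omega) - q(\omega)\big)}{\sum_\omega A(p(\omega))} = 0.$$
Moreover, for $u = s_p(q) = \log_\kappa q - \log_\kappa p + D(p\|q)$, we have $E_{\tilde p}[u] = 0$ and $q = \exp_\kappa\big(u - D(p\|q) + \log_\kappa p\big)$. Conversely, for all $p \in E(\Omega)$, if u is a random variable such that $E_{\tilde p}[u] = 0$, the real function
$$\psi \mapsto \sum_{\omega} \exp_\kappa\big(u(\omega) - \psi + \log_\kappa p(\omega)\big)$$
is continuous and strictly decreasing, goes to 0 as $\psi \to \infty$, and, for $\psi = 0$, takes a value of at least 1 because of Equation (4); indeed, the convexity of $\exp_\kappa$, whose derivative is $A \circ \exp_\kappa$, gives $\exp_\kappa(u + \log_\kappa p) \ge p + A(p)\,u$, and $\sum_\omega A(p(\omega))\,u(\omega) = 0$. In conclusion, there exists a unique function $K_p(u) \ge 0$ such that
$$e_p(u) = \exp_\kappa\big(u - K_p(u) + \log_\kappa p\big) \in E(\Omega). \qquad (7)$$
Hence, we have $K_p(s_p(q)) = D(p\|q)$, and the mapping $e_p$, defined on $\{u \mid E_{\tilde p}[u] = 0\}$, is a bijection onto $E(\Omega)$ whose inverse is $s_p$.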
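The construction above can be checked numerically. The following is a minimal sketch, assuming $\kappa = 1$ as in Equation (1); all function names are hypothetical. The normalising constant $K_p(u)$ is found by bisection, which relies on the monotonicity argument above (the sum is at least 1 at $\psi = 0$ and decreases to 0).

```python
import math


def log_k(x):
    # Kaniadakis logarithm, assumed kappa = 1: (x - 1/x) / 2
    return (x - 1.0 / x) / 2.0


def exp_k(y):
    # inverse of log_k, y + sqrt(1 + y^2), evaluated without cancellation
    t = abs(y) + math.hypot(1.0, y)
    return t if y >= 0 else 1.0 / t


def A(x):
    # reciprocal derivative of log_k: 2 x^2 / (1 + x^2)
    return 2.0 * x * x / (1.0 + x * x)


def escort(p):
    # escort probability function: A(p) normalised to sum to one
    w = [A(v) for v in p]
    total = sum(w)
    return [v / total for v in w]


def escort_mean(p, u):
    # escort expectation of the random variable u under p
    return sum(a * b for a, b in zip(escort(p), u))


def divergence(p, q):
    # Kaniadakis divergence D(p||q) = E_{p~}[log_k p - log_k q]
    return escort_mean(p, [log_k(a) - log_k(b) for a, b in zip(p, q)])


def K(p, u, tol=1e-15):
    """Normalising constant K_p(u) >= 0, by bisection; assumes E_{p~}[u] = 0."""
    def total(psi):
        return sum(exp_k(ui - psi + log_k(pi)) for ui, pi in zip(u, p))

    lo, hi = 0.0, 1.0
    while total(hi) > 1.0:  # total decreases to 0 as psi grows
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def e(p, u):
    # e_p(u) = exp_k(u - K_p(u) + log_k p), a point of the open simplex
    k = K(p, u)
    return [exp_k(ui - k + log_k(pi)) for ui, pi in zip(u, p)]


def s(p, q):
    # s_p(q) = log_k q - log_k p + D(p||q), the inverse of e_p
    d = divergence(p, q)
    return [log_k(b) - log_k(a) + d for a, b in zip(p, q)]


if __name__ == "__main__":
    p, q = [0.2, 0.3, 0.5], [0.1, 0.6, 0.3]
    u = s(p, q)
    print(divergence(p, q))            # positive, since p != q
    print(escort_mean(p, u))           # close to 0: u has zero escort mean at p
    print(K(p, u) - divergence(p, q))  # close to 0: K_p(s_p(q)) = D(p||q)
    print(e(p, u))                     # close to q: e_p inverts s_p
```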

Properties of the Cumulant Function K p
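Let us compute the derivatives of the function $K_p$. We use a square-bracket notation for the direction, $dK_p(u)[h]$. Differentiating in the direction h the normalisation $\sum_\omega e_p(u)(\omega) = 1$ implied by Equation (7), and using $\exp_\kappa' = A \circ \exp_\kappa$, gives $\sum_\omega A(q(\omega))\big(h(\omega) - dK_p(u)[h]\big) = 0$. It follows that, for each $p \in E(\Omega)$ and $u, h \in S_p E(\Omega)$, it holds
$$dK_p(u)[h] = \frac{\sum_\omega A(q(\omega))\,h(\omega)}{\sum_\omega A(q(\omega))} = E_{\tilde q}[h],$$
where $q = e_p(u)$; see Equation (7).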
unless the previous conditions hold true.
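The identity $dK_p(u)[h] = E_{\tilde q}[h]$ with $q = e_p(u)$ can be checked against a central finite difference of $K_p$. This is a minimal sketch, assuming $\kappa = 1$ and computing $K_p$ by bisection; the helper names, in particular `centre` and `gradient_check`, are hypothetical.

```python
import math


def log_k(x):
    # Kaniadakis logarithm, assumed kappa = 1
    return (x - 1.0 / x) / 2.0


def exp_k(y):
    # inverse of log_k, evaluated without cancellation
    t = abs(y) + math.hypot(1.0, y)
    return t if y >= 0 else 1.0 / t


def A(x):
    # reciprocal derivative of log_k
    return 2.0 * x * x / (1.0 + x * x)


def escort_mean(p, u):
    # escort expectation: sum A(p) u / sum A(p)
    w = [A(v) for v in p]
    return sum(wi * ui for wi, ui in zip(w, u)) / sum(w)


def K(p, u, tol=1e-15):
    """Normalising constant K_p(u) >= 0, by bisection; assumes E_{p~}[u] = 0."""
    def total(psi):
        return sum(exp_k(ui - psi + log_k(pi)) for ui, pi in zip(u, p))

    lo, hi = 0.0, 1.0
    while total(hi) > 1.0:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def e(p, u):
    # e_p(u) = exp_k(u - K_p(u) + log_k p)
    k = K(p, u)
    return [exp_k(ui - k + log_k(pi)) for ui, pi in zip(u, p)]


def centre(p, v):
    # subtract the escort mean so that the result lies in the fibre S_p
    m = escort_mean(p, v)
    return [vi - m for vi in v]


def gradient_check(p, u, h, eps=1e-5):
    """Return (finite-difference dK_p(u)[h], escort expectation E_{q~}[h])."""
    up = [a + eps * b for a, b in zip(u, h)]
    um = [a - eps * b for a, b in zip(u, h)]
    fd = (K(p, up) - K(p, um)) / (2.0 * eps)
    return fd, escort_mean(e(p, u), h)


if __name__ == "__main__":
    p = [0.2, 0.3, 0.5]
    u = centre(p, [0.4, -0.1, 0.3])
    h = centre(p, [1.0, 0.5, -0.2])
    print(gradient_check(p, u, h))  # the two numbers agree to about 1e-9
```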

Bibliographical Notes
Similarly, $d^2K_p$ and the convex conjugate of $K_p$ can be computed. See below for the duality, and see also [7,25]. The Kaniadakis logarithm and exponential were first introduced in [26,27]. The application to IG used here appeared in [6,24,28]. These papers discuss both the finite state space and the general state space.

Affine Space
The Kaniadakis non-parametric affine geometry of the open probability simplex is a variation of the standard case [7]. The main difference is the substitution of the expectation with the escort expectation.

Statistical Bundle
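The statistical bundle is an expression of the tangent space of $E(\Omega)$ as a dually flat affine statistical manifold in the sense of [5]. The statistical bundle $S E(\Omega)$ and each fibre $S_q E(\Omega)$ are defined by
$$S_q E(\Omega) = \{u \in L(\Omega) \mid E_{\tilde q}[u] = 0\}, \qquad S E(\Omega) = \{(q, u) \mid q \in E(\Omega),\ u \in S_q E(\Omega)\}.$$
In our setup, each fibre is a finite-dimensional vector space and can be identified with its dual. However, it is convenient to distinguish the two statistical bundles. The previous one is called the exponential statistical bundle, while the mixture statistical bundle is ${}^*S E(\Omega) = \{(q, v) \mid q \in E(\Omega),\ E_{\tilde q}[v] = 0\}$. For each couple $p, q \in E(\Omega)$, the mapping
$${}^eU_p^q \colon S_p E(\Omega) \ni u \mapsto u - E_{\tilde q}[u] \in S_q E(\Omega)$$
is a bijection. The mapping ${}^eU_p^p$ is the identity, and ${}^eU_q^r\,{}^eU_p^q = {}^eU_p^r$.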
The co-cycle of mappings ( e U q p ) p,q is the exponential parallel transport of the exponential statistical bundle.
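The exponential transport, read here as $u \mapsto u - E_{\tilde q}[u]$, and its co-cycle property can be verified numerically. A minimal sketch, assuming $\kappa = 1$ so that $A(x) = 2x^2/(1 + x^2)$; the function names are hypothetical. Note that applying `transport(p, p, ·)` to an arbitrary vector centres it, which is a convenient way to produce an element of $S_p E(\Omega)$.

```python
def A(x):
    # reciprocal derivative of the kappa = 1 Kaniadakis logarithm
    return 2.0 * x * x / (1.0 + x * x)


def escort_mean(p, u):
    # escort expectation: sum A(p) u / sum A(p)
    w = [A(v) for v in p]
    return sum(wi * ui for wi, ui in zip(w, u)) / sum(w)


def transport(p, q, u):
    """Exponential transport eU_p^q: u in S_p maps to u - E_{q~}[u] in S_q."""
    m = escort_mean(q, u)
    return [ui - m for ui in u]


if __name__ == "__main__":
    p, q, r = [0.2, 0.3, 0.5], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]
    u = transport(p, p, [1.0, -2.0, 0.5])  # centre an arbitrary vector at p
    v = transport(p, q, u)
    print(escort_mean(q, v))  # close to 0: v lies in the fibre at q
    direct = transport(p, r, u)
    composed = transport(q, r, v)
    print(max(abs(a - b) for a, b in zip(direct, composed)))  # close to 0
```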
The pairing $\langle v, w \rangle_p$, defined for all $p \in E(\Omega)$, $v \in {}^*S_p E(\Omega)$, and $w \in S_p E(\Omega)$, provides a duality between the fibres of $S E(\Omega)$ and ${}^*S E(\Omega)$. The dual of the exponential transport can be computed as follows. For $p, q \in E(\Omega)$, $v \in {}^*S_q E(\Omega)$, and $w \in S_p E(\Omega)$,

Now, $\frac{A \circ q}{A \circ p}\,v \in {}^*S_p E(\Omega)$; hence, the dual of the exponential transport is the mixture transport, ${}^mU_q^p v = \frac{A \circ q}{A \circ p}\,v$.

Velocity and Auto-Parallel Curves
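The following computation is a version of the original argument about Fisher's score. Let $t \mapsto q(t) \in E(\Omega)$ be a one-dimensional parametric statistical model, namely a curve in geometric language. We assume the curve is smooth (at least twice differentiable) as a mapping into the vector space $L(\Omega)$ of random variables. The velocity of the curve is defined as $\delta q(t) = \frac{d}{dt}\log_\kappa q(t) = \dot q(t)/A(q(t))$. We can check that $E_{\tilde q(t)}[\delta q(t)] = \sum_\omega \dot q(t,\omega) / \sum_\omega A(q(t,\omega)) = 0$, that is, $\delta q(t) \in S_{q(t)} E(\Omega)$, and that, for each random variable $f \in L(\Omega)$,
$$\frac{d}{dt} E_{q(t)}[f] = \sum_\omega f(\omega)\,\dot q(t,\omega) = \Big(\sum_\omega A(q(t,\omega))\Big)\, E_{\tilde q(t)}\Big[\big(f - E_{\tilde q(t)}[f]\big)\,\delta q(t)\Big]. \qquad (15)$$
The variation computed with the escort probability function, namely $f - E_{\tilde q}[f]$, appears in Equation (15) as a gradient of the expectation.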
For $q(0) = q_0$ and $q(1) = q_1$, let us compute the auto-parallel curves for the exponential transport, that is, the curves whose velocity $\delta q$ satisfies ${}^eU_{q(t)}^{q(s)}\,\delta q(t) = \delta q(s)$.
As observed above,

Surfaces of Constant Divergence
We have observed that an auto-parallel curve starting at $q(0)$ with velocity $\delta q(0)$ has the form of Equation (18). For given endpoints $q_0 = q(0)$ and $q_1 = q(1)$, it holds that $\delta q(0) = s_{q_0}(q_1)$, and the velocity of the auto-parallel curve at $q_1$ is ${}^eU_{q_0}^{q_1} s_{q_0}(q_1) = -s_{q_1}(q_0)$. Consider a curve γ starting at $\gamma(0) = q_1$, and assume that a divergence is constant along γ, so that its derivative is zero; in particular, it is zero at $t = 0$. That is, this surface of equi-divergence is orthogonal to the auto-parallel curves in the sense of the quadratic form $d^2K$. This generalises a well-known result in IG, where the Hessian of the cumulant function is the Fisher information matrix; see, for example, [5].
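The auto-parallel property can be illustrated numerically. This is a minimal sketch under two assumptions that are our reading of the missing formulas: the exponential auto-parallel curve from $q_0$ to $q_1$ is $q(t) = e_{q_0}\big(t\,s_{q_0}(q_1)\big)$, and the velocity is $\delta q(t) = \frac{d}{dt}\log_\kappa q(t)$, computed here by central differences. As before, $\kappa = 1$ and the function names are hypothetical. The code checks that the curve joins $q_0$ to $q_1$, that the velocity lies in the fibre at $q(t)$, and that transporting the velocity at one time gives the velocity at another.

```python
import math


def log_k(x):
    # Kaniadakis logarithm, assumed kappa = 1
    return (x - 1.0 / x) / 2.0


def exp_k(y):
    # inverse of log_k, evaluated without cancellation
    t = abs(y) + math.hypot(1.0, y)
    return t if y >= 0 else 1.0 / t


def A(x):
    # reciprocal derivative of log_k
    return 2.0 * x * x / (1.0 + x * x)


def escort_mean(p, u):
    w = [A(v) for v in p]
    return sum(wi * ui for wi, ui in zip(w, u)) / sum(w)


def divergence(p, q):
    # D(p||q) = E_{p~}[log_k p - log_k q]
    return escort_mean(p, [log_k(a) - log_k(b) for a, b in zip(p, q)])


def K(p, u, tol=1e-15):
    """Normalising constant K_p(u) >= 0, by bisection; assumes E_{p~}[u] = 0."""
    def total(psi):
        return sum(exp_k(ui - psi + log_k(pi)) for ui, pi in zip(u, p))

    lo, hi = 0.0, 1.0
    while total(hi) > 1.0:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def e(p, u):
    k = K(p, u)
    return [exp_k(ui - k + log_k(pi)) for ui, pi in zip(u, p)]


def s(p, q):
    d = divergence(p, q)
    return [log_k(b) - log_k(a) + d for a, b in zip(p, q)]


def curve(q0, q1, t):
    # assumed auto-parallel curve q(t) = e_{q0}(t * s_{q0}(q1))
    return e(q0, [t * ui for ui in s(q0, q1)])


def velocity(q0, q1, t, dt=1e-5):
    # assumed velocity d/dt log_k q(t), by central differences
    qp, qm = curve(q0, q1, t + dt), curve(q0, q1, t - dt)
    return [(log_k(a) - log_k(b)) / (2.0 * dt) for a, b in zip(qp, qm)]


def transport(p, q, u):
    # exponential transport eU_p^q u = u - E_{q~}[u]
    m = escort_mean(q, u)
    return [ui - m for ui in u]


if __name__ == "__main__":
    q0, q1 = [0.2, 0.3, 0.5], [0.1, 0.6, 0.3]
    print(curve(q0, q1, 0.0), curve(q0, q1, 1.0))  # close to q0 and q1
    q3, q8 = curve(q0, q1, 0.3), curve(q0, q1, 0.8)
    v3, v8 = velocity(q0, q1, 0.3), velocity(q0, q1, 0.8)
    print(escort_mean(q3, v3))  # close to 0
    print(max(abs(a - b) for a, b in zip(transport(q3, q8, v3), v8)))  # close to 0
```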

Displacement
The machinery introduced above allows for explicitly defining the affine structure, as originally defined by [29]. For a textbook treatment of affine geometry, see ([30], Ch. 2, 3, 9). We define the following two (dual) displacements on the statistical bundle. The mixture displacement is given by Equation (19), and the exponential displacement is $s_p(q) = \log_\kappa q - \log_\kappa p + D(p\|q)$, Equation (20). Both displacements define affine coordinates on the statistical bundle; the easy proofs are the same as in the standard case [7]. Each displacement defines an atlas of charts on the affine bundle.
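The affine character of the two displacements can be checked through the Chasles relation "displacement from p to q, plus the transported displacement from q to r, equals the displacement from p to r". This minimal sketch assumes $\kappa = 1$ and uses the exponential displacement $s_p(q)$ of Equation (20) with the exponential transport $u \mapsto u - E_{\tilde q}[u]$. For the mixture side it assumes the displacement $(q - p)/A(p)$ together with the mixture transport $v \mapsto (A \circ q / A \circ p)\,v$; the mixture formula is our assumption, chosen because it satisfies the Chasles relation with that transport, and it is not taken from Equation (19). All names are hypothetical.

```python
def log_k(x):
    # Kaniadakis logarithm, assumed kappa = 1
    return (x - 1.0 / x) / 2.0


def A(x):
    # reciprocal derivative of log_k
    return 2.0 * x * x / (1.0 + x * x)


def escort_mean(p, u):
    w = [A(v) for v in p]
    return sum(wi * ui for wi, ui in zip(w, u)) / sum(w)


def exp_displacement(p, q):
    # s_p(q) = log_k q - log_k p + D(p||q), with D(p||q) = E_{p~}[log_k p - log_k q]
    diff = [log_k(b) - log_k(a) for a, b in zip(p, q)]
    d = -escort_mean(p, diff)
    return [x + d for x in diff]


def mix_displacement(p, q):
    # assumed mixture displacement (q - p) / A(p); its escort mean at p is zero
    return [(b - a) / A(a) for a, b in zip(p, q)]


def exp_transport(source, target, u):
    # u - E_{target~}[u]
    m = escort_mean(target, u)
    return [x - m for x in u]


def mix_transport(source, target, v):
    # (A o source / A o target) v
    return [A(a) / A(b) * x for a, b, x in zip(source, target, v)]


if __name__ == "__main__":
    p, q, r = [0.2, 0.3, 0.5], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]
    lhs_e = [a + b for a, b in zip(exp_displacement(p, q),
                                   exp_transport(q, p, exp_displacement(q, r)))]
    lhs_m = [a + b for a, b in zip(mix_displacement(p, q),
                                   mix_transport(q, p, mix_displacement(q, r)))]
    print(max(abs(a - b) for a, b in zip(lhs_e, exp_displacement(p, r))))  # ~0
    print(max(abs(a - b) for a, b in zip(lhs_m, mix_displacement(p, r))))  # ~0
```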
The surfaces orthogonal to the exponential auto-parallel curves are discussed in the section above. The surfaces orthogonal to the mixture auto-parallel curves are easily seen to be associated with the other divergence. This is the classical result on the duality between the two divergences; see, for example, [5].
The availability of an affine bundle would allow for a coherent and straightforward definition of mechanical concepts such as velocity, acceleration, Lagrangian, and Hamiltonian. See [31,32] for the standard case. In the present paper, we develop the application to CoDa, and we stress the notion of affine barycentre and the fact that a system of charts can be viewed as a preprocessing of the data, to be followed by any method adapted to actual vector data.

Barycenter and Deviation
Let $f_1, \ldots, f_n$ be a sequence of CoDa points with strictly positive components, normalised to sum to one. Each data point is thus a point in the open probability simplex. The affine coordinates, Equation (20), centred at p are $s_p(f_1), \ldots, s_p(f_n)$.
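The mean value of the affine coordinates is
$$\bar s_p = \frac{1}{n}\sum_{j=1}^n s_p(f_j). \qquad (21)$$
If the mean value computed with centre q is $\bar s_q$, the difference is
$$\bar s_p - \bar s_q = \log_\kappa q - \log_\kappa p + \frac{1}{n}\sum_{j=1}^n \big(D(p\|f_j) - D(q\|f_j)\big).$$
Hence, $\bar s_p + \log_\kappa p = \bar s_q + \log_\kappa q + \text{constant}$. The probability function is the same in both cases. In fact,
$$\exp_\kappa\big(\bar s_p - K_p(\bar s_p) + \log_\kappa p\big) = \exp_\kappa\big(\bar s_q - K_p(\bar s_p) + \log_\kappa q + \text{constant}\big) = \exp_\kappa\big(\bar s_q - K_q(\bar s_q) + \log_\kappa q\big)$$
because of the uniqueness of the normalising constant.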
In conclusion, the probability function $\bar f = e_p(\bar s_p)$, with $\bar s_p$ as in Equation (21), does not depend on the reference p. It is the barycentre of the given data points. The displacement of each data point $f_j$ from the barycentre $\bar f$ is $s_{\bar f}(f_j) = \log_\kappa f_j - \log_\kappa \bar f + D(\bar f\|f_j)$, and the expression of each point $f_j$ in terms of the barycentre $\bar f$ is
$$f_j = \exp_\kappa\big(s_{\bar f}(f_j) - D(\bar f\|f_j) + \log_\kappa \bar f\big).$$
A one-dimensional summary of the divergence of each point from the barycentre, consistent with our formalism, is the Kaniadakis divergence $D(\bar f\|f_j)$, which is the normalising constant in the equation above. Another option is the Kaniadakis divergence $D(f_j\|\bar f)$, which appears in the representation of the barycentre in terms of the data point $f_j$.
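The barycentre and the deviations can be computed directly from these formulas. The following minimal sketch assumes $\kappa = 1$, uses made-up three-part compositions (not the SRISK data), and hypothetical function names. It computes $\bar f = e_p(\bar s_p)$ for two different references p to illustrate that the result does not depend on p, and then the divergences $D(\bar f\|f_j)$ as one-dimensional deviations.

```python
import math


def log_k(x):
    # Kaniadakis logarithm, assumed kappa = 1
    return (x - 1.0 / x) / 2.0


def exp_k(y):
    # inverse of log_k, evaluated without cancellation
    t = abs(y) + math.hypot(1.0, y)
    return t if y >= 0 else 1.0 / t


def A(x):
    # reciprocal derivative of log_k
    return 2.0 * x * x / (1.0 + x * x)


def escort_mean(p, u):
    w = [A(v) for v in p]
    return sum(wi * ui for wi, ui in zip(w, u)) / sum(w)


def divergence(p, q):
    # D(p||q) = E_{p~}[log_k p - log_k q]
    return escort_mean(p, [log_k(a) - log_k(b) for a, b in zip(p, q)])


def K(p, u, tol=1e-15):
    """Normalising constant K_p(u) >= 0, by bisection; assumes E_{p~}[u] = 0."""
    def total(psi):
        return sum(exp_k(ui - psi + log_k(pi)) for ui, pi in zip(u, p))

    lo, hi = 0.0, 1.0
    while total(hi) > 1.0:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def e(p, u):
    k = K(p, u)
    return [exp_k(ui - k + log_k(pi)) for ui, pi in zip(u, p)]


def s(p, q):
    d = divergence(p, q)
    return [log_k(b) - log_k(a) + d for a, b in zip(p, q)]


def barycentre(data, p):
    # mean of the coordinates s_p(f_j), mapped back to the simplex with e_p
    coords = [s(p, f) for f in data]
    mean = [sum(col) / len(data) for col in zip(*coords)]
    return e(p, mean)


if __name__ == "__main__":
    data = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.25, 0.25, 0.5]]  # made-up
    b1 = barycentre(data, [1.0 / 3.0] * 3)
    b2 = barycentre(data, [0.1, 0.2, 0.7])
    print(b1, b2)  # the same point up to rounding
    print([divergence(b1, f) for f in data])  # deviations D(bar f || f_j)
```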

Data Analysis
This section will use some geometric concepts derived from Kaniadakis' IG. It should be noted that our formalism is, in principle, affine and does not include any properly defined distance.

Kaniadakis Divergence
First, we compute the Kaniadakis divergence defined in Equation (5). Each point (i, j) in Figure 1
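A matrix of pairwise divergences, with entry (i, j) equal to $D(f_i\|f_j)$, is one way to organise such a computation. The sketch below assumes $\kappa = 1$ and uses made-up compositions and hypothetical names; it does not reproduce the data behind Figure 1. The diagonal is zero, the off-diagonal entries are positive, and the matrix is in general not symmetric.

```python
def log_k(x):
    # Kaniadakis logarithm, assumed kappa = 1
    return (x - 1.0 / x) / 2.0


def A(x):
    # reciprocal derivative of log_k
    return 2.0 * x * x / (1.0 + x * x)


def escort_mean(p, u):
    w = [A(v) for v in p]
    return sum(wi * ui for wi, ui in zip(w, u)) / sum(w)


def divergence(p, q):
    # D(p||q) = E_{p~}[log_k p - log_k q]
    return escort_mean(p, [log_k(a) - log_k(b) for a, b in zip(p, q)])


def divergence_matrix(data):
    """Entry (i, j) is D(f_i || f_j)."""
    return [[divergence(p, q) for q in data] for p in data]


if __name__ == "__main__":
    data = [  # made-up compositions, one per row
        [0.5, 0.3, 0.2],
        [0.2, 0.5, 0.3],
        [0.25, 0.25, 0.5],
        [0.4, 0.4, 0.2],
    ]
    for row in divergence_matrix(data):
        print(" ".join(f"{v:8.4f}" for v in row))
```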

Mixture Displacement
Equation (19) gives the mixture displacement. Figure 2 shows that the mixture displacement for Greece and Spain is very high in all years. The value for Spain in 2009 is below zero, the only negative value for Spain. In contrast, the other countries have few high values.
Equation (21) provides the mean value. After determining the mean, we compute the mixture displacement using the mean as the reference. We observe that the values then range from −10 to 10. However, the values for Greece and Spain decrease when the mixture displacement is computed from the mean.

Exponential Displacement
As above, Equation (20) gives the exponential displacement, and Figure 3 shows its empirical values. As for the mixture displacement, Spain and Greece have a higher displacement than the other European countries. The value for Spain in 2009 is very low.
If the mean is the reference point, the exponential displacement ranges from 0 to −60.

Conclusions and Discussion
In this research, we applied a particular divergence, the Kaniadakis divergence, to compositional data; it is aligned with the symmetrised ratio transformation in ([8], Example 4.20). The dataset examined spans the years 2008 through 2021. First, we built a theoretical framework for the Kaniadakis divergence, the mixture displacement, and the exponential displacement.
Section 1 provided the mathematical framework for computing divergences and displacements, while Section 2 demonstrated how to apply it to compositional data. In the application, we found that Spain and Greece fluctuate more than the other European countries. The values of the mixture and exponential displacements reflect the financial crises that Spain and Greece faced, compared with the other countries.
This simple application shows the potential of IG for compositional data analysis. We suggest that Kaniadakis' logarithm can reduce the computations for monitoring systemic risk to algebraic computations. The Kaniadakis logarithm and the mixture and exponential displacements on compositional data can broaden traditional methods for compositional data analysis.
We would like to add a few words regarding the specific tools and formalism we used here. First, we mimicked one of the possible presentations of non-parametric IG by following the basic dually-flat setup step by step. Another successful presentation of non-parametric IG starts with properly defining the divergences and deriving the geometry; see, for example, [33]. A popular approach, not equivalent to the affine one, defines the geometry of the probability simplex by introducing a metric tensor. As in other geometric theories, one should carefully distinguish between choosing charts and introducing a topology.
In the present approach, we define the charts so that the associated manifold is affine; in this setup, some specific divergences appear as naturally associated with the geometry and the basic statistical notion, namely the pairing between measures and random variables. Everything is applied to simple data manipulation in the spirit of Aitchison's methods.
No claim of optimality is made. The existence of many different but topologically equivalent divergences is only natural in our setup, where the topology actually depends on the geometry and not the other way around. Whenever needed, a choice must be based on some additional assumption. We carefully check that simple, useful operations on data, such as the geodesic connecting two given points, the velocity of variation, the barycentre, and the deviation from the barycentre, are all defined correctly.